Revisiting the Spatial and Temporal Modeling for Few-Shot Action Recognition
Abstract
Spatial and temporal modeling is one of the core aspects of few-shot action recognition. Most previous works focus mainly on long-term temporal relation modeling based on high-level spatial representations, without considering the crucial low-level spatial features and short-term temporal relations. The former can bring rich local semantic information, and the latter can represent the motion characteristics of adjacent frames. In this paper, we propose SloshNet, a new framework that revisits spatial and temporal modeling for few-shot action recognition in a finer manner. First, to exploit the low-level spatial features, we design a feature fusion architecture search module that automatically searches for the best combination of low-level and high-level spatial features. Next, inspired by the recent transformer, we introduce a long-term temporal modeling module to model the global temporal relations based on the extracted spatial appearance features. Meanwhile, we design another short-term temporal modeling module to encode the motion characteristics between adjacent frame representations. After that, the final predictions are obtained by feeding the embedded rich spatial-temporal features to a common frame-level class prototype matcher. We extensively validate the proposed SloshNet on four few-shot action recognition datasets: Something-Something V2, Kinetics, UCF101, and HMDB51. It achieves favorable results against state-of-the-art methods on all datasets.
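The last two steps of the abstract (short-term motion between adjacent frames, then frame-level class prototype matching) can be illustrated with a minimal sketch. This is not SloshNet's implementation: the function names, the plain feature differences used as a motion cue, the per-frame averaging used to build prototypes, and the squared Euclidean distance are all assumptions chosen for illustration, and the toy features are made up.

```python
from typing import Dict, List

Frame = List[float]  # one frame-level feature vector (hypothetical toy features)
Video = List[Frame]  # T frames per video


def adjacent_frame_differences(video: Video) -> Video:
    """Short-term motion cue: feature difference between each pair of adjacent frames."""
    return [
        [b - a for a, b in zip(prev, nxt)]
        for prev, nxt in zip(video, video[1:])
    ]


def class_prototype(support_videos: List[Video]) -> Video:
    """Frame-level prototype: average the support videos of one class, frame by frame."""
    n = len(support_videos)
    num_frames = len(support_videos[0])
    dim = len(support_videos[0][0])
    return [
        [sum(v[t][d] for v in support_videos) / n for d in range(dim)]
        for t in range(num_frames)
    ]


def frame_level_distance(query: Video, prototype: Video) -> float:
    """Mean squared Euclidean distance between temporally aligned frames."""
    total = 0.0
    for q, p in zip(query, prototype):
        total += sum((qi - pi) ** 2 for qi, pi in zip(q, p))
    return total / len(query)


def classify(query: Video, support: Dict[str, List[Video]]) -> str:
    """Return the class whose frame-level prototype is closest to the query."""
    distances = {
        label: frame_level_distance(query, class_prototype(videos))
        for label, videos in support.items()
    }
    return min(distances, key=distances.get)


# Toy 2-way 2-shot episode: 3 frames per video, 2-dimensional features.
support = {
    "push": [[[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
             [[0.0, 0.2], [1.0, 0.2], [2.0, 0.2]]],
    "pull": [[[2.0, 0.0], [1.0, 0.0], [0.0, 0.0]],
             [[2.0, 0.2], [1.0, 0.2], [0.0, 0.2]]],
}
query = [[0.1, 0.1], [1.1, 0.1], [1.9, 0.1]]

prediction = classify(query, support)
motion = adjacent_frame_differences(query)
```

In this toy episode the two classes contain the same frames in opposite temporal order, so only a matcher that compares frames position by position can separate them; the query is assigned to "push". A real system would compute the frame features with a learned backbone and would typically learn the temporal modeling rather than use raw differences.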
Similar resources
A Generative Approach to Zero-Shot and Few-Shot Action Recognition
We present a generative framework for zero-shot action recognition where some of the possible action classes do not occur in the training data. Our approach is based on modeling each action class using a probability distribution whose parameters are functions of the attribute vector representing that action class. In particular, we assume that the distribution parameters for any action class in...
Spatial-Temporal Trend Modeling for Ozone Concentration in Tehran City
Fitting a suitable covariance function for the correlation structure of spatial-temporal data requires de-trending the data. In this article, some potential models for spatial-temporal trend are presented. Eventually the best model will be announced for de-trending tropospheric ozone concentration data for the city of Tehran (Capital city of Iran). By using the selected trend model, some ...
Representing Pairwise Spatial and Temporal Relations for Action Recognition
The popular bag-of-words paradigm for action recognition tasks is based on building histograms of quantized features, typically at the cost of discarding all information about relationships between them. However, although the beneficial nature of including these relationships seems obvious, in practice finding good representations for feature relationships in video is difficult. We propose a si...
Mining Spatial and Spatio-Temporal ROIs for Action Recognition
Exploring Alternative Spatial and Temporal Dense Representations for Action Recognition
The automatic analysis of video sequences with individuals performing some actions is currently receiving much attention in the computer vision community. Among the different visual features chosen to tackle the problem of action recognition, local histogram within a region of interest is proven to be very effective. However, we study for the first time whether spatiograms, which are histograms...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i3.25403